Adaptive Join Plan Generation in Hadoop For CPS296.1 Course Project

Authors

  • Gang Luo
  • Liang Dong
Abstract

Joins in Hadoop have always been a problem for users: the Map/Reduce framework seems to be designed specifically for group-by aggregation tasks rather than cross-table operations. At the same time, joins in distributed database systems have never been easy, because data location and skew make join strategies harder to optimize. Fragment-replicate join (map join) can be a clever step towards good performance in some cases, but it can be a dangerous move under certain circumstances. This paper introduces some new techniques for map join that tackle these issues, and proposes a plan generator for the join types we currently have.
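As a rough illustration of what a fragment-replicate (map) join does, the following Python sketch simulates the idea outside Hadoop: the large table is split into fragments, the small table is replicated to every simulated mapper as an in-memory hash table, and each mapper joins its fragment locally with no reduce phase. All function names here are hypothetical and chosen for illustration only; this is not the paper's implementation and does not show the new techniques or the plan generator it proposes.

```python
from collections import defaultdict


def build_hash_table(small_rows, key_index):
    """Index the small (replicated) table by its join key."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key_index]].append(row)
    return table


def map_join_fragment(fragment_rows, key_index, small_table):
    """Simulated mapper: probe the in-memory table for each fragment row."""
    for row in fragment_rows:
        # .get() avoids inserting empty entries for keys with no match.
        for match in small_table.get(row[key_index], ()):
            yield row + match


def fragment_replicate_join(large_fragments, small_rows, large_key, small_key):
    """Join every fragment of the large table against a full copy of the small one."""
    joined = []
    for fragment in large_fragments:
        # Each mapper holds its own complete replica of the small table.
        # This is the risky part: the replica must fit in a mapper's memory.
        replica = build_hash_table(small_rows, small_key)
        joined.extend(map_join_fragment(fragment, large_key, replica))
    return joined


if __name__ == "__main__":
    fragments = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]
    small = [(1, "x"), (2, "y")]
    for row in fragment_replicate_join(fragments, small, 0, 0):
        print(row)
```

The sketch makes the trade-off visible: no shuffle is needed, but every mapper pays the memory and load cost of the whole small table, which is one reason the approach can fail when that table is larger than expected.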

Similar resources

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...

Joins for Hybrid Warehouses: Exploiting Massive Parallelism in Hadoop and Enterprise Data Warehouses

HDFS has become an important data repository in the enterprise as the center for all business analytics, from SQL queries and machine learning to reporting. At the same time, enterprise data warehouses (EDWs) continue to support critical business analytics. This has created the need for a new generation of special federation between Hadoop-like big data platforms and EDWs, which we call the hybrid...

Dynamic Join Algorithm Switching at Query Execution Time

Join optimization is one of the most challenging tasks in query processing. The performance of joins depends not only on the algebraic/logical query execution plan (QEP), but also on the chosen join algorithms. Static optimization techniques often suffer from outdated or unavailable statistics on the data. This may result in sub-optimal QEPs and poor query execution times. Adaptive Query Pr...

Heads-Join: Efficient Earth Mover's Distance Similarity Joins on Hadoop

The Earth Mover’s Distance (EMD) similarity join has a number of important applications such as near duplicate image retrieval and distributed based pattern analysis. However, the computational cost of EMD is super cubic and consequently the EMD similarity join operation is prohibitive for datasets of even medium size. We propose to employ the Hadoop platform to speed up the operation. Simply p...

Distributed Adaptive Windowed Stream Join Processing

This paper presents an adaptive framework for processing a window-based multi-way join query over distributed data streams. The framework integrates distributed plan modification and distributed plan migration within the same scope by using a building block called the node operator set (NOS). An NOS is housed in each node that participates in the join execution, and specifies the set of atomic ...

Publication date: 2010